A Gentle Introduction to Programming Concepts - Using Python

Introduction

Play along at home

You can follow along with the notebooks that we will be working through by going to the GitHub repository where we manage our content.

You can practice and play with code in our playground Jupyter Notebook platform - http://cc-playground.unmrds.net. We routinely reboot and clean out this system so don't do anything here (without downloading what you've done) that you want to keep.

Why learn the basic principles of programming?

  • Thinking algorithmically (a key element in the process used in developing programming solutions) is a powerful problem solving skill that is reinforced with practice. Practicing programming is great practice.
    • Defining a problem with sufficient specificity that a solution can be effectively developed
    • Defining what the end-product of the process should be
    • Breaking a problem down into smaller components that interact with each other
    • Identifying the objects/data and actions that are needed to meet the requirements of each component
    • Linking components together to solve the defined problem
    • Identifying potential expansion points to reuse the developed capacity for solving related problems

  • Capabilities to streamline and automate routine processes through scripting are ubiquitous

    • Query languages built into existing tools (e.g. Excel, ArcGIS, Word)
    • Specialized languages for specific tasks (e.g. R, Pandoc template language, PHP)
    • General purpose languages for solving many problems (e.g. Bash shell, Perl, Python, C#)
  • Repeatability with documentation

  • Scalability
  • Portability

Why Python?

  • It is available as a free and Open Source programming language that can be installed on numerous computer systems, including Windows, Linux and the Mac OS. It can even be edited and run through a web interface such as this Jupyter Notebook.
  • It is a modern programming language that includes many features that make it a very efficient language to both learn programming with and write programs in.
  • It is readable and expressive.
  • It supports a variety of development models including object-oriented, procedural and functional capabilities.
  • It includes a standard library of functions that support significant programming capabilities including:
    • Handling email
    • Interacting with and publishing web and other online resources
    • Connecting with a wide variety of databases
    • Executing operating system commands
    • Developing graphical user interfaces
  • It is relatively easy to start to become productive in Python, though it still takes time and practice to become an expert (as is the case with any programming language).

The primary downside that is mentioned when discussing the choice of Python as a programming language is that as an interpreted language it can execute more slowly than traditional compiled languages such as C or C++.

Can I Play at Home?

There are a variety of ways to run Python on your computer:

  • You may already have a version of Python installed. Many operating systems have a version of Python installed that is used for routine processes within the operating system. You can easily check to see what version of Python might already be on your computer by typing python at the Command Prompt (Windows) or in the Terminal (Mac OS) and seeing what response you get. If Python is installed you will typically see information about the currently installed version and then be taken to the Python command prompt where you can start typing commands.
  • You can install one of the available versions directly from the Python project site: https://www.python.org/downloads/. Following this installation you will be able to execute commands from the interactive command prompt or you can start the IDLE integrated development environment (IDE).
  • You can install a pre-packaged Python system such as the Anaconda release of Python (https://www.continuum.io/downloads) that has both Python 2.x and 3.x versions available for download. I prefer this method as it installs a copy of Python that is separate from any previous ones on your system, allows you to execute the (enhanced) interactive Python command prompt, and lets you run the Jupyter Notebook web-based environment for writing and executing Python code. The examples that we will go through today will be executed in the Jupyter Notebook environment.

Running a Python Environment

Once Python is installed on your computer you have a number of options for how you start up an environment where you can execute Python commands/code.

  1. The simplest method is to just type python at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you will be taken to the interactive prompt. For example:

     UL0100MAC:~ kbene$ python
     Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
     [GCC 4.2.1 (Apple Inc. build 5577)] on darwin
     Type "help", "copyright", "credits" or "license" for more information.
     Anaconda is brought to you by Continuum Analytics.
     Please check out: http://continuum.io/thanks and https://binstar.org
     >>>
  2. If you would like to run the IDLE IDE you should be able to find the executable file in the folder where the Python executable is installed on your system.

  3. If you installed the Anaconda release of Python you can type ipython at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you will be taken to an enhanced (compared with the basic Python prompt) interactive prompt. For example:

     UL0100MAC:~ kbene$ ipython
     Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
     Type "copyright", "credits" or "license" for more information.
    
     IPython 3.2.0 -- An enhanced Interactive Python.
     Anaconda is brought to you by Continuum Analytics.
     Please check out: http://continuum.io/thanks and https://anaconda.org
     ?         -> Introduction and overview of IPython's features.
     %quickref -> Quick reference.
     help      -> Python's own help system.
     object?   -> Details about 'object', use 'object??' for extra details.
    
     In [1]:
  4. If you installed the Anaconda release of Python you can type jupyter notebook at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you should see some startup messages in the terminal window and your browser should open and display the Jupyter Notebook interface, from which you can navigate through your system's folder structure (starting in the folder that you ran the jupyter notebook command from) and load existing notebooks or create new ones in which you can enter and execute Python commands. In recent releases of the Anaconda Python distribution you can also start a local Jupyter Notebook instance, along with other applications, from the included Anaconda Navigator application. This is the interface that we are using for today's workshop.

You can experiment with the examples we are using today in your own Jupyter notebook at http://cc-playground.unmrds.net . (password will be provided in the workshop)

Getting Help

There are a number of strategies that you can use for getting help with specific Python commands and syntax. First and foremost you can access the Python documentation, which defaults to the most recent Python 3.x version that is in production. If you are not using the version referenced by the page, you can select other Python versions in the upper left corner of the page. Looking at and working through some of the materials in the Python tutorial is also a great way to see the core Python capabilities in action.

In some cases you can find quite a few useful and interesting resources through a reasonably crafted Google search: e.g. for python create list.

You can also get targeted help with specific commands or objects from the command prompt by using the help() function, putting the name of the command or object between the parentheses ().

For example:

>>>help(print)

and

>>>help(str)

and

>>>myVar = [1,2,3,4,5]
>>>help(myVar)

Try It Yourself

Type in the help command in a code box in Jupyter Notebook for a few of the following commands/objects and take a look at the information you get:

  • dict - e.g. help(dict)
  • print
  • sorted
  • float

For some commands/functions you need to import the module that that command belongs to. For example:

    import os
    help(os.path)

Try this pair of commands in a code window in your Jupyter Notebook or interactive terminal.


In [1]:
# type your help commands in the box and 
# execute the code in the box by typing shift-enter 
# (hold down the shift key while hitting the enter/return key)

The Basics

At the core of Python (and any programming language) there are some key characteristics of how a program is structured that enable the proper execution of that program. These characteristics include the structure of the code itself, the core data types from which others are built, and core operators that modify objects or create new ones. From these raw materials more complex commands, functions, and modules are built. For guidance on recommended Python structure refer to the Python Style Guide.

Examples: Variables and Data Types

The Interpreter


In [1]:
# The interpreter can be used as a calculator, and can also echo or concatenate strings.

3 + 3


Out[1]:
6

In [2]:
3 * 3


Out[2]:
9

In [3]:
3 ** 3


Out[3]:
27

In [4]:
3 / 2 # true division - output is a floating point number


Out[4]:
1.5

In [5]:
# Use quotes around strings

'dogs'


Out[5]:
'dogs'

In [6]:
# + operator can be used to concatenate strings

'dogs' + "cats"


Out[6]:
'dogscats'

In [7]:
print('Hello World!')


Hello World!

Try It Yourself

Go to the section 4.4. Numeric Types in the Python 3 documentation at https://docs.python.org/3.4/library/stdtypes.html. The table in that section describes different operators - try some!

What is the difference between the different division operators (/, //, and %)?

Variables

Variables allow us to store values for later use.


In [9]:
a = 5
b = 10
a + b


Out[9]:
15

Variables can be reassigned:


In [10]:
b = 38764289.1097
a + b


Out[10]:
38764294.1097

The ability to reassign variable values becomes important when iterating through groups of objects for batch processing or other purposes. In the example below, the value of b is dynamically updated every time the while loop is executed:


In [11]:
a = 5
b = 10
while b > a:
    print("b="+str(b))
    b = b-1


b=10
b=9
b=8
b=7
b=6

Variable data types can be inferred, so Python does not require us to declare the data type of a variable on assignment.


In [12]:
a = 5
type(a)


Out[12]:
int

is equivalent to


In [13]:
a = int(5)
type(a)


Out[13]:
int

In [14]:
c = 'dogs'
print(type(c))

c = str('dogs')
print(type(c))


<class 'str'>
<class 'str'>

There are cases when we may want to declare the data type, for example to assign a different data type from the default that will be inferred. Concatenating strings provides a good example.


In [15]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + pizzas + ' pizzas.')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-57638a3a162b> in <module>
      1 customer = 'Carol'
      2 pizzas = 2
----> 3 print(customer + ' ordered ' + pizzas + ' pizzas.')

TypeError: must be str, not int

Above, Python has inferred the type of the variable pizzas to be an integer. Since strings can only be concatenated with other strings, our print statement generates an error. There are two ways we can resolve the error:

  1. Declare the pizzas variable as type string (str) on assignment or
  2. Re-cast the pizzas variable as a string within the print statement.

In [16]:
customer = 'Carol'
pizzas = str(2)
print(customer + ' ordered ' + pizzas + ' pizzas.')


Carol ordered 2 pizzas.

In [17]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + str(pizzas) + ' pizzas.')


Carol ordered 2 pizzas.

Given the following variable assignments:

x = 12
y = str(14)
z = 'donuts'

Predict the output of the following:

  1. y + z
  2. x + y
  3. x + int(y)
  4. str(x) + y

Check your answers in the interpreter.

Variable Naming Rules

Variable names are case sensitive and:

  1. Can only consist of one "word" (no spaces).
  2. Must begin with a letter or underscore character ('_').
  3. Can only use letters, numbers, and the underscore character.

We further recommend using variable names that are meaningful within the context of the script and the research.
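The naming rules above can be checked directly in the interpreter. The sketch below uses the built-in string method isidentifier() and the standard keyword module. The candidate names are made up purely for illustration.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the rules above
candidates = ["egg_length", "_count", "sample2", "2sample", "egg length", "egg-length"]

for name in candidates:
    # isidentifier() reports whether the text follows Python's naming rules
    print(name, "->", name.isidentifier())

# Reserved words such as 'for' follow the naming rules
# but still cannot be used as variable names
print(keyword.iskeyword("for"))

# Names are case sensitive: these are two different variables
total = 1
Total = 2
print(total, Total)
```

Only the first three candidates pass the check. The others start with a number, contain a space, or contain a character other than a letter, number, or underscore.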

Structure

Blocks

The structure of a Python program is pretty simple: Blocks of code are defined using indentation. Code that is at a lower level of indentation is not considered part of a block. Indentation can be defined using spaces or tabs (spaces are recommended by the style guide), but be consistent (and prepared to defend your choice). As we will see, code blocks define the boundaries of sets of commands that fit within a given section of code. This indentation model for defining blocks of code significantly increases the readability of Python code.

For example:

>>>a = 5
>>>b = 10
>>>while b > a:
...    print("b="+str(b))
...    b = b-1
...
>>>print("I'm outside the block")

Comments & Documentation

You can (and should) also include documentation and comments in the code you write - both for yourself, and potential future users (including yourself). Comments are pretty much any content on a line that follows a # symbol (unless it is between quotation marks). For example:

>>># we're going to do some math now
>>>yae = 5                   # the number of votes in favor
>>>nay = 10                  # the number of votes against
>>>proportion = yae / nay    # the ratio of votes in favor to votes against
>>>print(proportion)

When you are creating functions or classes (a bit more on what these are in a bit) you can also create what are called docstrings. Docstrings provide a defined location for content that is used to generate the help() information highlighted above, and that is also used by other systems for the automatic generation of documentation for packages. Creating a docstring is simple - just create a single or multi-line text string (more on this soon) that starts on the first indented line following the start of the definition of the function or class. For example:

>>># we're going to create a documented function and then access the information about the function
>>>def doc_demo(some_text="I'll skewer yer gizzard, ye salty sea bass"):
...    """This function takes the provided text and prints it out in Pirate
...    
...    If a string is not provided for `some_text` a default message will be displayed
...    """
...    out_string = "Ahoy Matey. " + some_text
...    print(out_string)
...
>>>help(doc_demo)
>>>doc_demo()
>>>doc_demo("Sail ho!")

Standard Objects

Any programming language has at its foundation a collection of types or, in Python's terminology, objects. The standard objects of Python consist of the following:

  • Numbers - integer, floating point, complex, and multiple-base defined numeric values
  • Strings - immutable strings of characters, numbers, and symbols that are bounded by single- or double-quotes
  • Lists - an ordered collection of objects that is bounded by square-brackets - []. Elements in lists are extracted or referenced by their position in the list. For example, my_list[0] refers to the first item in the list, my_list[5] the sixth, and my_list[-1] to the last item in the list.
  • Dictionaries - collections of objects that are referenced by keys rather than by position (since Python 3.7 dictionaries keep their items in insertion order, but items are still looked up by key). Dictionaries are bounded by curly-brackets - {} with each element of the dictionary consisting of a key (commonly a string, although numbers and other hashable objects also work) and a value (any object) separated by a colon :. Elements of a dictionary are extracted or referenced using their keys. For example:

      my_dict = {"key1":"value1", "key2":36, "key3":[1,2,3]}
      my_dict['key1'] returns "value1"
      my_dict['key3'] returns [1,2,3]
  • Tuples - immutable lists that are bounded by parentheses - (). Referencing elements in a tuple is the same as referencing elements in a list above.

  • Files - objects that represent external files on the file system. Programs can interact with (e.g. read, write, append) external files through their representative file objects in the program.
  • Sets - unordered collections of immutable objects (e.g. ints, floats, strings, and tuples) where membership in the set and uniqueness within the set are defining characteristics of the member objects. Sets are created using the set function on a sequence of objects, or by listing the members between curly-brackets. A specialized list of operators on sets allow for identifying union, intersection, and difference (among others) between sets.
  • Other core types - Booleans, types, None
  • Program unit types - functions, modules, and classes for example
  • Implementation-related types (not covered in this workshop)

These objects have their own sets of related methods (as we saw in the help() examples above) that enable their creation, and operations upon them.

>>># Fun with types
>>>
>>>this = 12
>>>that = 15
>>>the_other = "27"
>>>my_stuff = [this,that,the_other,["a","b","c",4]]
>>>more_stuff = {
...    "item1": this, 
...    "item2": that, 
...    "item3": the_other, 
...    "item4": my_stuff
...}
>>>this + that
>>>
>>># this won't work ...
>>>this + that + the_other
>>>
>>># ... but this will ...
>>>this + that + int(the_other)
>>>
>>># ...and this too
>>>str(this) + str(that) + the_other
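Dictionaries, tuples, and sets from the list of standard objects above can be explored in the same way. The following is a minimal sketch. my_dict repeats the dictionary shown in the list, while my_tuple, set_a, and set_b are illustrative names that do not appear elsewhere in this notebook.

```python
# Dictionaries: values are looked up by key rather than by position
my_dict = {"key1": "value1", "key2": 36, "key3": [1, 2, 3]}
print(my_dict["key1"])
my_dict["key4"] = "a new value"      # add a new key/value pair
print(sorted(my_dict.keys()))

# Tuples: referenced like lists, but cannot be changed after creation
my_tuple = ("a", "b", "c")
print(my_tuple[0], my_tuple[-1])
try:
    my_tuple[0] = "z"
except TypeError as err:
    print("Tuples are immutable:", err)

# Sets: duplicates are dropped, and set operators compare membership
set_a = set([1, 2, 2, 3, 4])
set_b = set([3, 4, 5])
print(set_a | set_b)    # union
print(set_a & set_b)    # intersection
print(set_a - set_b)    # difference
```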

Lists

https://docs.python.org/3/library/stdtypes.html?highlight=lists#list

Lists are a type of collection in Python. Lists allow us to store sequences of items that are typically but not always similar. All of the following lists are legal in Python:


In [18]:
# Separate list items with commas!

number_list = [1, 2, 3, 4, 5]
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']
combined_list = [1, 2, 'oranges', 3.14, 'peaches', 'grapes', 99.19876]

# Nested lists - lists of lists - are allowed.

list_of_lists = [[1, 2, 3], ['oranges', 'grapes', 8], [['small list'], ['bigger', 'list', 55], ['url_1', 'url_2']]]

There are multiple ways to create a list:


In [19]:
# Create an empty list

empty_list = []

# As we did above, by using square brackets around a comma-separated sequence of items

new_list = [1, 2, 3]

# Using the type constructor

constructed_list = list('purple')

# Using a list comprehension

result_list = [i for i in range(1, 20)]

We can inspect our lists:


In [20]:
empty_list


Out[20]:
[]

In [21]:
new_list


Out[21]:
[1, 2, 3]

In [22]:
result_list


Out[22]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [23]:
constructed_list


Out[23]:
['p', 'u', 'r', 'p', 'l', 'e']

The above output for constructed_list may seem odd. Referring to the documentation, we see that the argument to the type constructor is an iterable, which according to the documentation is "An object capable of returning its members one at a time." In our constructor statement above

# Using the type constructor

constructed_list = list('purple')

the word 'purple' is the object - in this case a word - that when used to construct a list returns its members (individual letters) one at a time.

Compare the outputs below:


In [24]:
constructed_list_int = list(123)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-c2e9f559c5ec> in <module>
----> 1 constructed_list_int = list(123)

TypeError: 'int' object is not iterable

In [25]:
constructed_list_str = list('123')
constructed_list_str


Out[25]:
['1', '2', '3']
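Strings are not the only iterables that the list constructor accepts. The sketch below tries a few other built-in iterables. The variable names are illustrative only.

```python
# A range, a tuple, and a dictionary are all iterables
from_range = list(range(3))
from_tuple = list(('a', 'b'))
from_dict = list({'x': 1, 'y': 2})   # iterating over a dictionary yields its keys
print(from_range)
print(from_tuple)
print(from_dict)

# To make a one-item list from a number, use square brackets instead
single_item = [123]
print(single_item)

# Convert a number to a string first to split it into its digits
digits = list(str(123))
print(digits)
```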

Lists in Python are:

  • mutable - the list and list items can be changed
  • ordered - list items keep the same "place" in the list

Ordered here does not mean sorted. The list below is printed with the numbers in the order we added them to the list, not in numeric order:


In [26]:
ordered = [3, 2, 7, 1, 19, 0]
ordered


Out[26]:
[3, 2, 7, 1, 19, 0]

In [27]:
# There is a 'sort' method for sorting list items as needed:

ordered.sort()
ordered


Out[27]:
[0, 1, 2, 3, 7, 19]

Info on additional list methods is available at https://docs.python.org/3/library/stdtypes.html?highlight=lists#mutable-sequence-types
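As a brief illustration of mutability and of a few of the methods described in that documentation, here is a minimal sketch. The fruits and numbers lists are made up for this example.

```python
fruits = ['apples', 'oranges', 'pears']

fruits[0] = 'apricots'          # replace an item by position
fruits.append('grapes')         # add an item to the end
fruits.insert(1, 'bananas')     # insert an item at position 1
fruits.remove('pears')          # remove the first matching item
last = fruits.pop()             # remove and return the last item

print(fruits)
print(last)

# sorted() returns a new sorted list and leaves the original unchanged,
# while the sort() method changes the list in place
numbers = [3, 2, 7, 1]
print(sorted(numbers))
print(numbers)
```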

Because lists are ordered, it is possible to access list items by referencing their positions. Note that the position of the first item in a list is 0 (zero), not 1!


In [28]:
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']

In [29]:
string_list[0]


Out[29]:
'apples'

In [30]:
# We can use positions to 'slice' or select sections of a list:

string_list[3:]


Out[30]:
['grapes', 'pineapples']

In [31]:
string_list[:3]


Out[31]:
['apples', 'oranges', 'pears']

In [32]:
string_list[1:4]


Out[32]:
['oranges', 'pears', 'grapes']

In [33]:
# If we don't know the position of a list item, we can use the 'index()' method to find out.
# Note that in the case of duplicate list items, this only returns the position of the first one:

string_list.index('pears')


Out[33]:
2

In [34]:
string_list.append('oranges')

In [35]:
string_list


Out[35]:
['apples', 'oranges', 'pears', 'grapes', 'pineapples', 'oranges']

In [36]:
string_list.index('oranges')


Out[36]:
1

In [14]:
# one more time with lists and dictionaries
list_ex1 = my_stuff[0] + my_stuff[1] + int(my_stuff[2])
print(list_ex1)

list_ex2 = (
    str(my_stuff[0]) 
    + str(my_stuff[1]) 
    + my_stuff[2] 
    + my_stuff[3][0]
)
print(list_ex2)

dict_ex1 = (
    more_stuff['item1']
    + more_stuff['item2']
    + int(more_stuff['item3'])
)
print(dict_ex1)

dict_ex2 = (
    str(more_stuff['item1'])
    + str(more_stuff['item2'])
    + more_stuff['item3']
)
print(dict_ex2)


54
121527a
54
121527

In [16]:
# Now try it yourself ...
# print out the phrase "The answer: 42" using the following 
# variables and one or more of your own and the 'print()' function
# (remember spaces are characters as well)

start = "The"
answer = 42

Operators

If objects are the nouns, operators are the verbs of a programming language. We've already seen examples of some operators: assignment with the = operator, arithmetic addition and string concatenation with the + operator, arithmetic division and subtraction with the / and - operators, and comparison with the > operator. Different object types have different operators that may be used with them. The Python Documentation provides detailed information about the operators and their functions as they relate to the standard object types described above.
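A few more operators are sketched below, using made-up values for a and b. This is a small sample rather than a complete list.

```python
a = 7
b = 2

# Arithmetic operators
print(a + b, a - b, a * b, a / b)

# Comparison operators return Boolean values (True or False)
print(a > b, a == b, a != b)

# Logical operators combine Boolean values
print(a > 5 and b > 5)
print(a > 5 or b > 5)
print(not a > 5)

# The 'in' operator tests membership in a collection or string
print('pears' in ['apples', 'pears'])
print('cat' in 'concatenate')

# The same operator can behave differently for different object types
print([1, 2] + [3])
print('ab' * 3)
```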

Flow Control and Logical Tests

Flow control commands allow for the dynamic execution of parts of the program based upon logical conditions, or processing of objects within an iterable object (like a list or dictionary). Some key flow control commands in Python include:

  • while-else loops that continue to run until the termination test is False or a break command is issued within the loop:

      done = False
      i = 0
      while not done:
          i = i+1
          if i > 5: done = True
  • if-elif-else statements define alternative blocks of code that are executed if a test condition is met:

      do_something = "what?"
      if do_something == "what?":
          print(do_something)
      elif do_something == "where?":
          print("Where are we going?")
      else:
          print("I guess nothing is going to happen")
  • for loops allow for repeated execution of a block of code for each item in a Python iterable such as a list or dictionary. For example:

      my_stuff = ['a', 'b', 'c']
      for item in my_stuff:
          print(item)
    
      a
      b
      c
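The flow control commands above can also be combined. The sketch below shows the else part of a while-else loop together with break, and a for loop over a dictionary with an if-else test inside it. The numbers and inventory values are invented for illustration.

```python
# while-else: the else block runs only if the loop ends without a break
numbers = [1, 3, 5, 8, 9]
i = 0
first_even = None
while i < len(numbers):
    if numbers[i] % 2 == 0:
        first_even = numbers[i]
        break                      # leave the loop early; the else block is skipped
    i = i + 1
else:
    print("No even numbers found")
print("First even number:", first_even)

# for loop over a dictionary: the loop variable takes each key in turn
inventory = {"apples": 3, "pears": 0, "grapes": 12}
in_stock = []
for fruit in inventory:
    if inventory[fruit] > 0:
        in_stock.append(fruit)
    else:
        print("Out of stock:", fruit)
print(in_stock)
```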

Functions

Functions represent reusable blocks of code that you can reference by name and pass information into to customize the execution of the function, and receive a response representing the outcome of the defined code in the function.
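As a minimal sketch of these ideas, the hypothetical function below (order_summary is an illustrative name, building on the earlier pizza example) takes information in through its parameters and hands a result back with return:

```python
def order_summary(customer, pizzas=1):
    """Return a sentence describing a customer's pizza order.

    `pizzas` is optional and defaults to 1.
    """
    if pizzas == 1:
        item = ' pizza.'
    else:
        item = ' pizzas.'
    return customer + ' ordered ' + str(pizzas) + item


# Arguments can be passed by position ...
print(order_summary('Carol', 2))

# ... left out when a default value is defined ...
print(order_summary('Dave'))

# ... or passed by name, with the returned value stored for later use
message = order_summary(customer='Erin', pizzas=3)
print(message)
```

Because the function returns its result instead of printing it, the same code can be reused wherever the sentence is needed, for example when writing to a file.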

Putting it all together

An example of reading a data file and doing basic work with it illustrates all of these concepts. This also illustrates the concept of writing a script that combines all of your commands into a file that can be run (eggs.py in this case).

#!/usr/bin/env python

import csv

# create an empty list that will be filled with the rows of data from the CSV as dictionaries
csv_content = []

# open and loop through each line of the csv file to populate our data file
# (the 'utf-8-sig' encoding removes the byte order mark that otherwise shows up
# as '\ufeff' at the start of the first column name)
with open('aaj1945_DataS1_Egg_shape_by_species_v2.csv', encoding='utf-8-sig', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    lineNo = 0
    for row in csv_reader:             # process each row of the csv file
        csv_content.append(row)
        if lineNo < 3:                 # print out a few lines of data for our inspection
            print(row)
        lineNo += 1

# create some empty lists that we will fill with values for each column of data
order = []
family = []
species = []
asymmetry = []
ellipticity = []
avglength = []

# for each row of data in our dataset write a set of values into the lists of column values
for item in csv_content:
    order.append(item['Order'])
    family.append(item['Family'])
    species.append(item['Species'])

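    # values that cannot be converted to a number (e.g. empty cells)
    # are replaced with a placeholder value of -9999
    try:
        asymmetry.append(float(item['Asymmetry']))
    except (TypeError, ValueError):
        asymmetry.append(-9999)

    try:
        ellipticity.append(float(item['Ellipticity']))
    except (TypeError, ValueError):
        ellipticity.append(-9999)

    try:
        avglength.append(float(item['AvgLength (cm)']))
    except (TypeError, ValueError):
        avglength.append(-9999)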

print()
print()

# Calculate and print some statistics
mean_asymmetry = sum(asymmetry)/len(asymmetry)
print("Mean Asymmetry: ", str(mean_asymmetry))
mean_ellipticity = sum(ellipticity)/len(ellipticity)
print("Mean Ellipticity: ", str(mean_ellipticity))
mean_avglength = sum(avglength)/len(avglength)
print("Mean Average Length: ", str(mean_avglength))

# What's wrong with these results? What would you do next to fix the problem?

Going beyond the Standard Library

While Python's Standard Library of modules is very powerful and diverse, you will encounter times when you need functionality that is not included in the base installation of Python. Fear not, there are over 100,000 additional packages that have been developed to extend the capabilities of Python beyond those provided in the default installation. The central repository for Python packages is the Python Package Index, which can be browsed on the web or interacted with programmatically using the pip utility.

Once installed, the functionality of a module (standard or not) is added to a script using the import command.
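The import command has a few common forms, sketched below. The examples use only modules from the Standard Library so that they run without installing anything, and the file path is made up for illustration.

```python
# Import a whole module and refer to its contents using the module name
import math
print(math.sqrt(16))

# Import a single name from a module
from statistics import mean
print(mean([1, 2, 3, 4]))

# Import a module under a shorter alias
import os.path as osp
print(osp.basename('/data/eggs.py'))
```

A package installed from the Python Package Index is imported in the same way once it has been installed.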


In [ ]: